Abstract: Subtractive dithered quantizers are examined with the aim of
minimizing the signal-band dither power. The design of finite
impulse response (FIR) filters that shape most of the dither
power out of the signal band while maintaining the benefits of
dithering is dealt with in detail. Simulation results for low- to
medium-resolution quantizers are presented to highlight the
overall design considerations.



I. Introduction 

Quantizers are the portals to digital signal processing of
all real-world signals and hence serve as the main interface
between natural and machine-based signal processing. The
main purpose of a quantizer is to represent signals in a form
that is easy to operate on and easy to store in digital computers. An
example mid-tread quantizer is shown in Fig. 1. The quantizer
is said to not overload if $|z[n]| < Q\Delta/2$ (note that throughout
this paper we shall not make any distinction between signals
$z[l]$ and $z_l$, where $l$ denotes the time index), where $Q$ denotes
the number of output levels (here $Q = 5$). As can be seen
from Fig. 1, the input-output characteristic of any such
quantizer is evidently non-linear, and hence signals, when
quantized, acquire spectral content hitherto absent in them
[1]-[3], [5]. Of particular interest are sinusoidal signals [3],
which are composed of discrete tonal frequencies. Such signals,
when passed through quantizers, give rise to spurious tones
that corrupt the output signal spectrum. There is a rich body
of literature attempting to find the resulting quantization error
statistics and spectrum [1]-[3], [5]. A major understanding
from all these works is that the input signal to the quantizer
needs to have certain statistical properties in order
to ensure that the quantization error samples are independent
and uniformly distributed, a consequence of the latter being
the ubiquitous $\Delta^2/12$ ($\Delta$ being the quantization step size)
quantization error power. In most practical scenarios, though,
it is infeasible to guarantee signals with the required
statistical properties (in fact, behavioral simulations typically
use deterministic input signals such as sinusoids, which
render the quantization error completely deterministic given
the quantizer characteristic). So, intuitively, a small random
signal (called dither) is added to the input in order
to make the composite signal samples unpredictable at any given
time. In other words, the composite signal is constrained to
possess the statistical properties outlined in [3]. Let us take a
more formal view of quantization after dithering.
Fig. 1: Mid-tread quantizer
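The mid-tread characteristic and the no-overload condition described above can be sketched in a few lines. This is only an illustrative model, assuming a unit step size, Q = 5 levels and round-half-up tie handling; the function names and the test sinusoid are hypothetical and not taken from the paper.

```python
import math

def midtread_quantize(z, delta=1.0, levels=5):
    """Mid-tread quantizer with an odd number of output levels and step delta."""
    half = (levels - 1) // 2
    # Round to the nearest multiple of delta, then clip to the outermost levels.
    index = math.floor(z / delta + 0.5)
    index = max(-half, min(half, index))
    return index * delta

def overloads(z, delta=1.0, levels=5):
    """True when the no-overload condition |z| < Q*delta/2 is violated."""
    return abs(z) >= levels * delta / 2

# A sinusoid of amplitude 2*delta (assumed frequency 0.01 cycles/sample).
# Without dither, the error e = y - x is a deterministic function of x.
x = [2.0 * math.sin(2 * math.pi * 0.01 * n) for n in range(200)]
e = [midtread_quantize(v) - v for v in x]
```

Because the error here depends only on the input sample, a periodic input produces a periodic error, which is the origin of the spurious tones mentioned in the text.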



A. Dithered quantization 

A random signal r[n] is added to the signal to be quantized,
x[n], and the composite signal z[n] = x[n] + r[n] is
passed through the quantizer as shown in Fig. 2. There is
a subtle difference, though, between Fig. 2(a) and (b). In
Fig. 2(b), the added dither signal r[n] is subtracted digitally
from the quantized value y[n]; this arrangement is hence called a
subtractively dithered quantizer. Likewise, Fig. 2(a) shows a
non-subtractively dithered quantizer (commonly called an
additive dithered quantizer). The added dither r[n] is usually
constrained to lie within one least significant bit
(LSB) of the quantizer. Separate conditions BJ have been
theoretically derived for either case to ensure that the error
samples (e[n] = y[n] - x[n] for Fig. 2(a) and e[n] = y[n] - z[n]
for Fig. 2(b)) are independent and uniformly distributed in
terms of both first- and second-order statistics. Formally:

• $e_n$ is uniformly distributed.

• $(e_n, e_{n-p})$ are pairwise independent and uniformly
distributed $\forall p \in \mathbb{Z} - \{0\}$.

• $e_n$ is independent of $x_{n-m}$ $\forall m \in \mathbb{Z} - \{0\}$.
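The difference between the two architectures of Fig. 2(a) and (b) can be illustrated with a small Monte Carlo sketch. It assumes a unit step size, a dither that is uniform on (-Δ/2, Δ/2), and constant test inputs; these choices and the function names are illustrative assumptions, not the paper's simulation setup.

```python
import math
import random

def quantize(z, delta=1.0):
    """Unbounded mid-tread quantizer (no overload) with step delta."""
    return delta * math.floor(z / delta + 0.5)

def error_variance(x, subtractive, n=20000, delta=1.0, seed=0):
    """Sample variance of the error for a constant input x.

    subtractive=True  -> e = y - z (dither removed after quantization)
    subtractive=False -> e = y - x (dither left in the output)
    """
    rng = random.Random(seed)
    errs = []
    for _ in range(n):
        r = rng.uniform(-delta / 2, delta / 2)   # assumed uniform dither
        y = quantize(x + r, delta)
        errs.append((y - r - x) if subtractive else (y - x))
    mean = sum(errs) / n
    return sum((v - mean) ** 2 for v in errs) / n

for x in (0.0, 0.5):
    print(x, error_variance(x, True), error_variance(x, False))
```

In this sketch the subtractive error variance stays near Δ²/12 for both inputs, while the non-subtractive error variance changes with the input value, which is one visible symptom of the error not being independent of the input.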

From the conditions outlined in BJ, it becomes evident that
the properties listed above are a lot more likely to hold for
a subtractively dithered quantizer than for an additive one. In
fact, for the latter, it can be shown that the error samples
e[n] are never statistically independent of the input samples
x[n-p] $\forall p \in \mathbb{Z}$. On the other hand, for a subtractively dithered
quantizer, the main conditions imposed on the added dither
r[n] for the above properties to hold are:



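$$
\begin{aligned}
\Phi_{r_n}(u)\big|_{u = 2\pi k/\Delta} &= 0, \quad k \in \mathbb{Z} - \{0\} \\
\Phi_{r_n, r_{n-p}}(u_1, u_2)\big|_{u_1 = 2\pi k_1/\Delta,\; u_2 = 2\pi k_2/\Delta} &= 0, \quad (k_1, k_2) \in \mathbb{Z}^2 - (0, 0)
\end{aligned}
\tag{1}
$$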



where $\Phi_w(u)$ is the characteristic function (cf) of the random
variable $w$ and $\Phi_{w_1, w_2}(u_1, u_2)$ is the joint characteristic
function (jcf) of the random variables $w_1$ and $w_2$.
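As a quick sanity check on conditions of this type, the sketch below evaluates the cf of a dither that is uniform on (-Δ/2, Δ/2), which is sin(uΔ/2)/(uΔ/2), at the frequencies u = 2πk/Δ. The uniform i.i.d. dither, the step size 0.25 and the function names are assumptions chosen for illustration.

```python
import math

def cf_uniform(u, delta=1.0):
    """cf of a random variable uniform on (-delta/2, delta/2)."""
    a = u * delta / 2
    return 1.0 if a == 0 else math.sin(a) / a

def jcf_iid_uniform(u1, u2, delta=1.0):
    """jcf of two independent uniform dither samples (product of the cfs)."""
    return cf_uniform(u1, delta) * cf_uniform(u2, delta)

delta = 0.25
# First-order condition: the cf vanishes at every non-zero multiple of 2*pi/delta.
marginal = [cf_uniform(2 * math.pi * k / delta, delta) for k in range(1, 6)]
# Second-order condition: the jcf vanishes whenever (k1, k2) != (0, 0).
joint = [
    jcf_iid_uniform(2 * math.pi * k1 / delta, 2 * math.pi * k2 / delta, delta)
    for k1 in range(-2, 3) for k2 in range(-2, 3) if (k1, k2) != (0, 0)
]
```

All entries of `marginal` and `joint` are zero up to floating-point rounding, so an i.i.d. uniform dither of one LSB meets both requirements.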

Unfortunately, such an r[n] would contribute too much noise
to the quantizer output. In fact, a uniformly distributed dither
signal would degrade the overall signal-to-noise ratio (SNR) by
3 dB. Furthermore, it may be impossible, or at least extremely
challenging, to generate such dither digitally. The immediate
solution is to spectrally shape the dither
energy out of the signal band of interest p). Such an
architecture is presented in Fig. 2(c) as an extension of Fig.
2(b). However, filtering a signal is tantamount to modifying its
statistical properties. Consequently, the filtered signal r[n] in
Fig. 2(c) may not possess the properties outlined in (1). There
have been some very interesting works treating filtered dither
signals and their efficacy in whitening the quantization error,
notable among which are [4], [6], [7]. With reference to Fig.
2(c), a detailed analysis of the properties of r[n] is carried out
in [4], where the d[n]'s are i.i.d. random variables. However, the
analysis is specific to additive dithered quantizers and imposes very
strict conditions on the filter coefficients (FIR or IIR). In [7], a
simplified condition is derived for FIR filters to ensure that the
error samples possess the properties outlined in (1). However,
the quantizer treated in [7] works on integer values only, and
the whitening conditions are derived only for error-sample
pairs that are separated by at least the filter length. The
work in [6] also provides conditions on the impulse response
of the IIR filter (the integrator in the feed-forward path of a
sigma-delta modulator) to ensure (1). In this work, we provide an
alternative technique to digitally filter (using an FIR filter) a
bi-valued dither signal, which enables the dither signal to span
a finite number of values within the coarse LSB while pushing most
of the dither energy to frequencies where the input signal has
no or negligible content. We theoretically derive conditions
for achieving complete whitening of the error sequence. In
the next section, we put forth the proposed technique and
investigate conditions for (1) to hold. In Section III, we
simplify some of the conditions from Section II to ensure
approximate whitening of the error signal for a continuous-valued
error signal. In Section IV, we furnish pertinent simulation
results to support our claims, and we conclude the paper in
Section V.

II. Proposed Technique 

With reference to Fig. 2(c), let us define a Bernoulli
sequence d[n] that follows the statistics Pr(d[n] = 0) =
Pr(d[n] = 1) = 0.5. The sequence d[n] is passed through
a digital filter G(z) having a finite impulse response g[n]
of length K to produce an output r[n]. The filter gain is
adjusted so that the output r[n] lies in $[-\Delta/2, \Delta/2]$. Consequently,
the filtered output r[n] can be expressed as

$$
r[n] = \frac{\Delta}{L}\left\{ g[0]d[n] + g[1]d[n-1] + \cdots + g[K-1]d[n-K+1] \right\} \tag{2}
$$

where $L$ is the $\ell_1$ norm of the filter $g$. The quantity $\Delta/L$
can be thought of as the dither LSB (the minimum resolution
of the added signal $r_n$).
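A minimal sketch of the dither generator in (2) follows. It assumes the coefficients of the example filter G2 from Section IV and a unit step size; the function name and the seed are hypothetical.

```python
import random

def filtered_dither(g, n, delta=1.0, seed=0):
    """Generate n samples of r[n] = (delta/L) * sum_i g[i] * d[n-i], as in (2)."""
    rng = random.Random(seed)
    K = len(g)
    L = sum(abs(c) for c in g)                            # l1 norm of the filter
    d = [rng.randint(0, 1) for _ in range(n + K - 1)]     # Bernoulli(0.5) on {0, 1}
    r = []
    for t in range(K - 1, n + K - 1):
        acc = sum(g[i] * d[t - i] for i in range(K))      # integer FIR output
        r.append(delta * acc / L)
    return r, L

g2 = [-1, -2, -4, -8, 16, -1]    # G2 from Section IV
r, L = filtered_dither(g2, 5000)
print(L, min(r), max(r))
```

For this filter the l1 norm is 32, so every dither sample is an integer multiple of the dither LSB Δ/32 and stays within [-Δ/2, Δ/2].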

Note: It is pertinent to observe here that, since the
added filtered dither signal $r_n$ has a finite resolution, namely
$\Delta/L$, any input signal $x_n$ below this resolution will not
experience any whitening action. In the following arguments,
we shall assume that the input signal $x_n$ is sufficiently larger
than the dither LSB so that any correlation arising from the
input signal residing between the dither steps is negligible.

We propose the following theorem to ensure an almost white
error sequence.

Theorem 1: Suppose the input to a non-overloading Q-level
quantizer is $z_n = x_n + r_n$, where $r_n = g_n * d_n$ and $x_n$ is a
bounded sequence, i.e. $x_n \in [-(Q-1)\Delta/2, (Q-1)\Delta/2]$ for
a sample mid-tread quantizer. Let $(U, V)$ be two independent,
uniformly distributed random variables in $(-\Delta/2, \Delta/2]$. For
all $(k_1, k_2) \in (-L/2, L/2]$,

i) $e_n$ is independent and identically distributed uniformly

ii) $(e_n, e_{n-p})$ converges in distribution to $(U, V)$ $\forall p \in \mathbb{Z} - \{0\}$
iff at least one of the following conditions holds:

1) A non-negative integer $l < p$ exists such that
$\langle g_l k_1 \rangle_L = L/2$

2) A non-negative integer $1 \le r \le p$ exists such that
$\langle g_{K-r} k_2 \rangle_L = L/2$

3) A non-negative integer $p \le m < K$ exists such that
$\langle g_m k_1 + g_{m-p} k_2 \rangle_L = L/2$

where the $\langle \cdot \rangle_T$ operator denotes the modulo-$T$ operation.

Remark: We shall prove property (ii) above to derive
conditions (1), (2) and (3), and then arrive at the proof of
property (i) as a simplified subset.
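The three conditions can be checked mechanically for a given integer-coefficient filter and lag p. The sketch below is one possible reading, assuming i.i.d. Bernoulli d[n] and r[n] as in (2): it collects the integer weight that multiplies each distinct dither sample in k1*r[n] + k2*r[n-p] and tests whether any weight is congruent to L/2 modulo L. The helper names are hypothetical.

```python
def combined_coeffs(g, p, k1, k2):
    """Weights of d[n-m], m = 0..K+p-1, in (L/delta)*(k1*r[n] + k2*r[n-p]), p >= 1."""
    K = len(g)
    coeffs = []
    for m in range(K + p):
        c = 0
        if m < K:
            c += g[m] * k1            # contribution through r[n]
        if 0 <= m - p < K:
            c += g[m - p] * k2        # contribution through r[n-p]
        coeffs.append(c)
    return coeffs

def jcf_vanishes(g, p, k1, k2):
    """True if some weight is congruent to L/2 modulo L (conditions 1-3)."""
    L = sum(abs(c) for c in g)
    return L % 2 == 0 and any(c % L == L // 2 for c in combined_coeffs(g, p, k1, k2))

def whitens_pair(g, p):
    """Check every (k1, k2) != (0, 0) in one period (-L/2, L/2]."""
    L = sum(abs(c) for c in g)
    ks = range(-(L // 2) + 1, L // 2 + 1)
    return all(jcf_vanishes(g, p, k1, k2)
               for k1 in ks for k2 in ks if (k1, k2) != (0, 0))

g2 = [-1, -2, -4, -8, 16, -1]     # G2 from Section IV
print(whitens_pair(g2, 6), whitens_pair(g2, 1))
```

For G2 the check passes at lag p = 6 (no shared dither samples) but fails at p = 1, for example at (k1, k2) = (1, -2), which is in line with the later remark that the full conditions are hard to satisfy for every lag.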

Proof: The proof uses characteristic functions to
derive conditions on the specific properties of the added
dither signal. This is a commonly used technique for such
applications [7]. In fact, from p), we know that the joint
characteristic function of the error samples $(e_n, e_{n-p})$ can be
written as
Fig. 2: Dithered quantizers: (a) non-subtractive/additive, (b) subtractive, (c) filtered-subtractive

Hence, for the joint density of $(e_n, e_{n-p})$ to converge to
that of $(U, V)$, it suffices to show [3]
Now, since r[n] takes on only a finite number of values in the set
$\mathcal{A} = \{-\Delta/2, -\Delta/2 + \Delta/L, -\Delta/2 + 2\Delta/L, \ldots, \Delta/2\}$, we can write

Substituting in (5),

Clearly, the RHS of (6) is $L$-periodic in $(k_1, k_2)$. In essence,
$L^2$ jcf's should be accounted for to ensure that the
condition given in (4) holds.
For a Bernoulli dither $d_n$ with Pr($d_n$ = 0) = Pr($d_n$ = 1) =
0.5, the cf of $d_n$ is $\Phi_d(u) = e^{ju/2}\cos(u/2)$. Thus, we can write,

So, based on conditions (1)-(3), the above result goes to
zero $\forall (k_1, k_2) \neq (0, 0)$. ■
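The closed form of the Bernoulli cf used in the proof is easy to confirm numerically. The sketch below compares the definition E[exp(j u d)] = (1 + exp(j u))/2 with exp(j u/2) cos(u/2); the function names and the test frequencies are arbitrary choices.

```python
import cmath
import math

def cf_bernoulli(u):
    """E[exp(j*u*d)] for d in {0, 1} with equal probability."""
    return 0.5 * (1 + cmath.exp(1j * u))

def cf_closed_form(u):
    """Closed form exp(j*u/2) * cos(u/2)."""
    return cmath.exp(1j * u / 2) * math.cos(u / 2)

test_points = [0.0, 0.3, 1.0, math.pi / 2, 2.5, 7.0]
max_gap = max(abs(cf_bernoulli(u) - cf_closed_form(u)) for u in test_points)
```

Since the cosine factor vanishes only at odd multiples of π, a factor evaluated at u = 2πc/L is zero exactly when the integer c is congruent to L/2 modulo L, which is the form the three conditions take.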

The proof of property (i) follows along similar lines.
For the sake of completeness, we proceed as shown.



Proof: i) From [1], 
Thus, for the LHS of (9) to converge to that of a uniform
random variable, it is sufficient to show that

where * denotes the convolution operation.

From condition (1) (or (3)), letting |p| > K, one of the
above product terms goes to zero $\forall k \in \mathbb{Z} - \{0\}$. ■

Remarks: From conditions (1)-(3) in Theorem 1, it may not
always be possible to find FIR filter coefficients that both impart
appreciable in-band dither energy suppression and
satisfy the enumerated conditions. In the following section, we
propose another theorem which ensures almost whiteness.

III. Simplified Conditions for Approximate Whiteness

Theorem 2: Suppose the input to a Q-level non-overloading
quantizer is $z_n = x_n + r_n$, where $r_n = g_n * d_n$ and $x_n$ is
a bounded sequence, i.e. $x_n \in [-(Q-1)\Delta/2, (Q-1)\Delta/2]$.
Let $(U, V)$ be two independent, uniformly distributed random
variables in $(-\Delta/2, \Delta/2]$. Then

i) $e_n$ is independent and identically (uniformly) distributed
and

ii) $(e_n, e_{n-p})$ converges in distribution to $(U, V)$ for all $|p| > K$
iff

1) The FIR filter coefficients g[k] are of the form $2^i$, where
each $i \in [0, s-1]$ occurs at least once, and

2) $L = \|g\|_1 = 2^s$, where $s \in \mathbb{Z}$, $s > 1$.

1 As noted before, it is assumed that $x_n$ is sufficiently greater than the dither
LSB $\Delta/L$ in amplitude.

Proof: Here also, we shall start from property (ii) and then arrive
at the proof for property (i). As pointed out in the proof of
Theorem 1, in order to prove (ii), it suffices to prove

$$
\Phi_{r_n, r_{n-p}}\left(\frac{2\pi k_1}{\Delta}, \frac{2\pi k_2}{\Delta}\right) = 0
\quad \forall (k_1, k_2) \in \mathbb{Z}^2 - (0, 0)
$$

Again, from (6), this amounts to proving,

Now it becomes useful to consider the following cases,
$\forall (k_1, k_2) \in [-L/2 + 1, L/2]$, assuming the conditions in
Theorem 2 hold.

• $k_1$ odd, $k_2$ odd: One product term of the right-hand
side of (13) can be written as $\cos(\pi k_j 2^r / L)$, $j = 1, 2$. Hence,
for $r = s-1$, we can write the product term as $\cos(\frac{\pi}{2} k_j)$,
which goes to 0 since $k_{1,2}$ are odd.

• $k_1$ odd, $k_2$ even: Here, $k_1$ will drive the product
term to 0 for $r = s-1$. The symmetric case of $k_2$
odd, $k_1$ even can similarly be shown to equate to 0.

• $k_1$ even, $k_2$ even: Here, let $k_{1,2} = 2^l(2m+1)$, $l \le s-1$,
for any integer $m$. Then the product term containing
$r = s-1-l$ would yield $\cos(\frac{\pi}{2}(2m+1))$, which again
goes to 0.



Property (i) of Theorem 2 follows, almost directly, from the 
proof above, and hence is not given here for brevity. 
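The two conditions of Theorem 2 lend themselves to a direct check. The sketch below encodes one reading of them (every coefficient magnitude is a power of two, each exponent 0..s-1 appears at least once, and the l1 norm equals 2^s with s > 1) and evaluates the magnitude of the marginal dither cf at u = 2πk/Δ as a product of cosines. The function names and the counter-example filter [1, -3, 3, -1] are hypothetical.

```python
import math

def satisfies_theorem2(g):
    """Check the power-of-two structure assumed in Theorem 2 (one interpretation)."""
    mags = [abs(c) for c in g]
    L = sum(mags)
    s = L.bit_length() - 1
    if s < 2 or (1 << s) != L:                  # condition 2: L = 2**s, s > 1
        return False
    if any(m == 0 or (m & (m - 1)) != 0 for m in mags):
        return False                            # a magnitude is not a power of two
    exponents = {m.bit_length() - 1 for m in mags}
    return set(range(s)) <= exponents           # condition 1: each i in [0, s-1] occurs

def marginal_cf_magnitude(g, k):
    """|cf of r[n]| at u = 2*pi*k/delta: product of |cos(pi*k*g_i/L)|."""
    L = sum(abs(c) for c in g)
    return math.prod(abs(math.cos(math.pi * k * c / L)) for c in g)

g2 = [-1, -2, -4, -8, 16, -1]      # G2 from Section IV
other = [1, -3, 3, -1]             # hypothetical filter that violates condition 1
worst_g2 = max(marginal_cf_magnitude(g2, k) for k in range(1, 32))
print(satisfies_theorem2(g2), worst_g2)
print(satisfies_theorem2(other), marginal_cf_magnitude(other, 1))
```

For G2 the cf magnitude is zero (to rounding) at every k from 1 to L-1, mirroring the odd/even case analysis above, whereas the counter-example leaves a clearly non-zero value at k = 1.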

IV. Simulation Results 

We present here simulation results pertaining to the
simplified conditions from Theorem 2 since, as noted previously,
it may not always be possible to impart an appropriate high-pass
shape to the dither signal while satisfying the conditions of
Theorem 1. Let us consider two example filters,

$$
G_1(z) = 1 - [\ldots] + 5z^{-2} - 9z^{-3} + 3z^{-4} - 3z^{-5} + 9z^{-6} - [\ldots] + 3z^{-8} - z^{-9}
$$

$$
G_2(z) = -1 - 2z^{-1} - 4z^{-2} - 8z^{-3} + 16z^{-4} - z^{-5}
$$

Fig. 3: Comparison of $G_1$ and $G_2$: (a) error pmf for $G_1$, (b) error pmf for $G_2$

Verifying, we find that $G_1(z)$ satisfies neither of the two
conditions of Theorem 2, while $G_2(z)$ satisfies both. The input
x[n] is chosen to be a sinusoid with an amplitude of $2\Delta$. The
signal is quantized into Q = 5 levels as in Fig. 1. In Fig.
3(a) and (b), we plot the pmf of the error sequence $e_n$ for both
cases, while Fig. 3(c) and (d) show the spectra of the error
signal. As can be clearly seen, the proposed filter, namely
$G_2$, whitens the error sequence and exhibits an almost uniform
pmf (Fig. 3(b)), while $G_1$ shows an almost triangular pmf (Fig.
3(a)) for the error samples. The power spectral densities (psd)
support the same conclusion. The error psd for $G_2$ (Fig.
3(d)) is white, while the error psd for $G_1$ exhibits multiple
spurious tones at harmonic frequencies (as is expected from a
lookup-table type non-linearity) (Fig. 3(c)). In order to make
a fair comparison, a third case, where a uniform dither signal
r[n] (the case in Fig. 2(b)) is added to the input signal before
quantizing, is also considered. The spectrum of y[n] is plotted
for all three cases ($G_1$, $G_2$ and uniform dither) in Fig. 4.
As can be seen, the uniformly dithered quantizer contributes the
most in-band power while whitening the output spectrum
completely. $G_2$ shapes the in-band dither power and also
removes any spurious components, while $G_1$ has the least
in-band dither power contribution but engenders harmful spurious
tones at the quantizer output.
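The error-pmf experiment for G2 can be approximated with the following sketch. It assumes Δ = 1, Q = 5, a sinusoid of amplitude 2Δ at an arbitrarily chosen frequency of 0.01234 cycles per sample, and an 8-bin histogram of e = y - z; none of these numerical choices, nor the function names, come from the paper.

```python
import math
import random

def quantize(z, delta=1.0, levels=5):
    """Mid-tread quantizer with clipping to the outermost levels."""
    half = (levels - 1) // 2
    idx = max(-half, min(half, math.floor(z / delta + 0.5)))
    return idx * delta

def simulate_error_pmf(g, n=40000, bins=8, delta=1.0, seed=1):
    """Histogram (as fractions) of e = y - z with FIR-filtered Bernoulli dither."""
    rng = random.Random(seed)
    K, L = len(g), sum(abs(c) for c in g)
    d = [rng.randint(0, 1) for _ in range(K)]        # newest sample first
    counts = [0] * bins
    for t in range(n):
        d = [rng.randint(0, 1)] + d[:-1]             # shift in a new dither bit
        r = delta * sum(gi * di for gi, di in zip(g, d)) / L
        x = 2.0 * delta * math.sin(2 * math.pi * 0.01234 * t)
        z = x + r
        e = quantize(z, delta) - z                   # lies in [-delta/2, delta/2]
        b = min(bins - 1, int((e + delta / 2) / delta * bins))
        counts[b] += 1
    return [c / n for c in counts]

pmf_g2 = simulate_error_pmf([-1, -2, -4, -8, 16, -1])
print([round(p, 3) for p in pmf_g2])
```

With G2, each histogram bin should collect roughly one eighth of the samples, which is the flat error pmf described for Fig. 3(b); swapping in a filter that violates Theorem 2 is a simple way to explore the contrast.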

Fig. 4: Spectrum of y[n] for three different scenarios

V. Conclusion

A dithering technique for quantizers is proposed. The technique
relies on FIR filtering of the dither signal to minimize
in-band SNR corruption. Theoretical conditions on the filter
structure are derived to ensure whitening of the quantization
error signal. Behavioral simulation results are presented to
corroborate the proposed technique and claims.